Dynamic Load Balancing in Stream Processing Pipelines Containing Stream-Static Joins
Abstract
Data stream processing systems are used to continuously run mission-critical applications for real-time monitoring and alerting. These systems require high throughput and low latency to process incoming data streams in real time. However, changes in the distribution of incoming data streams over time can cause partition skew, which is defined as an unequal distribution of data partitions among workers, resulting in sub-optimal processing due to unbalanced load. This paper presents the first solution designed specifically to address partition skew in the context of joining streaming and static data. Our solution uses state-of-the-art principles to monitor processing load, detect load imbalance, and dynamically redistribute partitions to achieve optimal load balance. To accomplish this, our solution leverages the collocation of streaming and static data, while considering the processing load of the join and the subsequent stream processing operations. Finally, we present the results of an experimental evaluation, in which we compared the throughput and latency of four stream processing pipelines containing such a join. The results show that our solution achieved significantly higher throughput and lower latency than the competing approaches.
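The monitor, detect, and redistribute cycle described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names, the imbalance measure (heaviest worker load divided by mean load), and the greedy heaviest-first reassignment are all assumptions chosen for illustration, and the sketch does not model the collocation of streaming and static data or the cost of moving a partition.

```python
from typing import Dict, List


def worker_loads(partition_load: Dict[int, float],
                 assignment: Dict[int, int],
                 num_workers: int) -> List[float]:
    """Sum the measured load of every partition assigned to each worker."""
    loads = [0.0] * num_workers
    for pid, worker in assignment.items():
        loads[worker] += partition_load[pid]
    return loads


def imbalance(loads: List[float]) -> float:
    """Heaviest worker load divided by the mean load (1.0 means balanced).

    Hypothetical skew measure, not taken from the paper.
    """
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def rebalance(partition_load: Dict[int, float], num_workers: int) -> Dict[int, int]:
    """Greedy reassignment: place the heaviest partition first, always on the
    currently least-loaded worker. Illustrative stand-in for redistribution."""
    loads = [0.0] * num_workers
    assignment: Dict[int, int] = {}
    by_load = sorted(partition_load.items(), key=lambda kv: kv[1], reverse=True)
    for pid, load in by_load:
        target = min(range(num_workers), key=lambda w: loads[w])
        assignment[pid] = target
        loads[target] += load
    return assignment


# One hot partition (id 0) skews a simple modulo assignment over two workers.
partition_load = {0: 50.0, 1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0, 5: 10.0}
skewed = {pid: pid % 2 for pid in partition_load}
before = imbalance(worker_loads(partition_load, skewed, 2))

balanced = rebalance(partition_load, 2)
after = imbalance(worker_loads(partition_load, balanced, 2))
```

In this toy input the modulo assignment puts loads of 70 and 30 on the two workers, while the greedy pass yields 50 and 50. A real system would also weigh the cost of migrating a partition against the expected gain before triggering a move.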
Similar resources
Adaptive Load Diffusion for Stream Joins
Data stream processing has become increasingly important as many emerging applications call for sophisticated realtime processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins are among the most important stream processing operations, which can be used to detect linkages and correlations between different data streams. ...
Static Optimisation vs. Dynamic Evaluation for Data Stream Processing
The work presented in this dissertation offers the quantitative comparison between two different execution frameworks for queries over data streams. The first framework is the static one. Its optimiser decides the execution plan, and it orders the operators according to it. Then, it schedules the incoming data through these operators. The plan is fixed and it cannot change throughout the processin...
Stream-processing pipelines: processing of streams on multiprocessor architecture
In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four cases of communication organization are considered and compared. Examples of application are given fo...
PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning
In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur large communication cost. As queries run continuously, the precious bandwidths would be aggressively consumed without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join...
Approximate Data Stream Joins in Distributed Systems
The emergence of applications producing continuous high-frequency data streams has brought forth a large body of research in the area of distributed stream processing. In the presence of high volumes of data, efforts have primarily concentrated on providing approximate aggregate or top-k type results. Scalable solutions for providing answers to window join queries in distributed stream processing s...
Journal
Journal title: Electronics
Year: 2023
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics12071613